Spectral concentration and greedy k-clustering
Abstract
A popular graph clustering method is to consider the embedding of an input graph into R^k induced by the first k eigenvectors of its Laplacian, and to partition the graph via geometric manipulations on the resulting metric space. Despite the practical success of this methodology, there is limited understanding of several heuristics that follow this framework. We provide theoretical justification for one such natural and computationally efficient variant. Our result can be summarized as follows. A partition of a graph is called strong if each cluster has small external conductance and large internal conductance. We present a simple greedy spectral clustering algorithm which returns a partition that is provably close to a suitably strong partition, provided that such a partition exists. A recent result shows that strong partitions exist for graphs with a sufficiently large spectral gap between the k-th and (k + 1)-st eigenvalues. Taking this together with our main theorem gives a spectral algorithm which finds a partition close to a strong one for graphs with a large enough spectral gap. We also show how this simple greedy algorithm can be implemented in near-linear time for any fixed k and error guarantee. Finally, we evaluate our algorithm on some real-world and synthetic inputs.
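The pipeline described in the abstract (spectral embedding followed by a greedy geometric partition) can be illustrated with a minimal sketch. The abstract does not spell out the greedy rule, so the version below is an assumption: it repeatedly picks the ball of a fixed radius that covers the most unassigned points, and puts whatever remains into the last cluster. The use of the normalized Laplacian, the function names `spectral_embedding` and `greedy_k_clustering`, and the `radius` parameter are all hypothetical choices for illustration. The sketch uses a dense eigendecomposition and a full distance matrix, so it does not reflect the near-linear running time mentioned above.

```python
import numpy as np

def spectral_embedding(adj, k):
    """Map each vertex to a point in R^k using the k eigenvectors of the
    normalized Laplacian with the smallest eigenvalues (an assumed choice)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # assumes the graph has no isolated vertices
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)    # eigh returns eigenvalues in ascending order
    return vecs[:, :k]

def greedy_k_clustering(points, k, radius):
    """Assumed greedy rule: k - 1 times, pick the unassigned centre whose
    ball of the given radius covers the most unassigned points."""
    n = len(points)
    labels = np.full(n, -1)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    for c in range(k - 1):
        free = labels == -1
        if not free.any():
            break
        # Number of unassigned points within `radius` of each candidate centre.
        counts = (dists <= radius)[:, free].sum(axis=1)
        counts[~free] = -1           # only unassigned points may serve as centres
        centre = int(np.argmax(counts))
        labels[free & (dists[centre] <= radius)] = c
    labels[labels == -1] = k - 1     # everything left over forms the last cluster
    return labels
```

A call such as `greedy_k_clustering(spectral_embedding(adj, k), k, radius)` takes a symmetric adjacency matrix `adj` as a NumPy array. The radius has to be chosen by the caller; a value that is small compared with the distance between embedded clusters is the intended regime.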
Related resources
Using Greedy Clustering Method to Solve Capacitated Location-Routing Problem with Fuzzy Demands
In this paper, the capacitated location routing problem with fuzzy demands (CLRP_FD) is considered. In CLRP_FD, the facility location problem (FLP) and the vehicle routing problem (VRP) are observed simultaneously. Indeed, the vehicles and the depots have a predefined capacity to serve the customers t...
Clustering for Data Reduction: A Divide and Conquer Approach
We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where we perform an ...
The Comparison of Top Leaders Algorithm and Other Algorithms
In this project I explored the Top Leaders algorithm [1] and compared it with several other community discovery algorithms. Community discovery is an important and interesting research field in social network analysis. By detecting communities in a social network, companies can adopt different marketing strategies and recommend different products for people in different communities, or...
A Complex Networks Approach for Data Clustering
Many methods have been developed for data clustering, such as k-means, expectation maximization and algorithms based on graph theory. In the latter case, graphs are generally constructed using the Euclidean distance as a similarity measure, and partitioned using spectral methods. However, these methods are not accurate when the clusters are not well separated. In addition, it ...
Tight Continuous Relaxation of the Balanced k-Cut Problem
Spectral Clustering as a relaxation of the normalized/ratio cut has become one of the standard graph-based clustering methods. Existing methods for the computation of multiple clusters, corresponding to a balanced k-cut of the graph, are either based on greedy techniques or heuristics which have a weak connection to the original motivation of minimizing the normalized cut. In this paper we propos...